Medical endoscope image recognition method and system, and endoscopic imaging system

ABSTRACT

A medical endoscope image recognition method is provided. In the method, endoscope images are received from a medical endoscope. The endoscope images are filtered with a neural network, to obtain target endoscope images. Organ information corresponding to the target endoscope images is recognized via the neural network. An imaging type of the target endoscope images is identified according to the corresponding organ information with a classification network. A lesion region in the target endoscope images is localized according to an organ part indicated by the organ information. A lesion category of the lesion region in an image capture mode of the medical endoscope corresponding to the imaging type is identified.

RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/CN2020/087184, entitled “MEDICAL ENDOSCOPE IMAGE IDENTIFICATION METHOD AND SYSTEM, AND ENDOSCOPE IMAGE SYSTEM” and filed on Apr. 27, 2020, which claims priority to Chinese Patent Application No. 201910372711.4, entitled “MEDICAL ENDOSCOPE IMAGE RECOGNITION METHOD AND SYSTEM, DEVICE, AND ENDOSCOPIC IMAGING SYSTEM” and filed May 6, 2019. The entire disclosures of the prior applications are hereby incorporated by reference in their entirety.

FIELD OF THE TECHNOLOGY

Embodiments of this disclosure relate to the field of computer application technologies, including a medical endoscope image recognition method and system, and an endoscopic imaging system.

BACKGROUND OF THE DISCLOSURE

Category identification based on deep learning is generally an important tool for classifying large amounts of data in various application scenarios. For example, in application scenarios such as image and natural language processing, large-scale classification and recognition of a large amount of data may be implemented, so as to rapidly and accurately obtain a related classification prediction result and accelerate the implementation of functions in the application scenarios.

During classification prediction performed on images, the images for classification prediction and the methods for classification prediction differ across deployed application scenarios. Taking an Artificial Intelligence (AI) plus medical scenario as an example, as the endoscope continuously photographs the alimentary canal, a large quantity of endoscope images is formed, and a classification prediction method is thus required to classify and recognize this large quantity of endoscope images.

However, related medical image processing has a single classification prediction function, which cannot be adapted to the whole process of photographing by the endoscope to generate a medical endoscope video stream. Moreover, since the capturing of medical endoscope images is unavoidably affected by switching and shaking of the endoscope, and the lens of the endoscope unavoidably encounters various liquids and foreign matters during photographing, the obtained endoscope images often contain a large amount of interference and noise, resulting in weak robustness. Hence, it is desirable to provide a method and system for recognizing a medical endoscope image that can be adapted to the whole process of photographing by the endoscope in the alimentary canal and that has relatively strong robustness.

SUMMARY

To resolve the technical problems in the related art that classification prediction of medical images cannot be adapted to the whole process of capturing medical endoscope images by an endoscope and that the robustness is weak, embodiments of this disclosure include a medical endoscope image recognition method and system, and an endoscopic imaging system.

A medical endoscope image recognition method is provided. In the method, endoscope images are received from a medical endoscope. The endoscope images are filtered with a neural network, to obtain target endoscope images. Organ information corresponding to the target endoscope images is recognized via the neural network. An imaging type of the target endoscope images is identified according to the corresponding organ information with a classification network. A lesion region in the target endoscope images is localized according to an organ part indicated by the organ information. A lesion category of the lesion region in an image capture mode of the medical endoscope corresponding to the imaging type is identified.

A medical endoscope image recognition system is provided. The medical endoscope image recognition system includes processing circuitry configured to receive endoscope images from a medical endoscope, and filter the endoscope images with a neural network, to obtain target endoscope images. The processing circuitry is configured to recognize organ information corresponding to the target endoscope images via the neural network, and identify an imaging type of the target endoscope images according to the corresponding organ information with a classification network. Further, the processing circuitry is configured to localize a lesion region in the target endoscope images according to an organ part indicated by the organ information; and identify a lesion category of the lesion region in an image capture mode of the medical endoscope corresponding to the imaging type.

A machine device is provided, including a processor and a memory. The memory stores computer-readable instructions that, when executed by the processor, implement the foregoing medical endoscope image recognition method.

A non-transitory computer-readable storage medium is provided. The non-transitory computer-readable storage medium stores instructions which, when executed by a processor, cause the processor to perform the foregoing medical endoscope image recognition method.

An endoscopic imaging system is provided. The endoscopic imaging system includes the medical endoscope image recognition system, and a display device configured to display the endoscope images.

An endoscopic imaging system is provided. The endoscopic imaging system includes a display device for a medical endoscope video and a workstation. The workstation can be configured to implement the foregoing medical endoscope image recognition method by using a medical endoscope video stream outputted by an endoscope as an input.

The technical solutions provided in the embodiments of this disclosure may include the following beneficial effects:

For a given medical endoscope video stream, original endoscope images are first obtained therefrom, and the obtained original endoscope images are then filtered by using a neural network to obtain target endoscope images, to eliminate the large amount of interference that arises from switching and shaking of the endoscope during photographing and from the lens encountering various liquids and foreign matters, so that robustness is enhanced.

After the original endoscope images are filtered, corresponding organ information is recognized from the generated target endoscope images, so that an image type suitable for the target endoscope images is identified according to the corresponding organ information by using a classification network. Finally, in a photographing mode corresponding to the image type, a lesion region is localized according to the part indicated by the organ information and the lesion category thereof is identified. In this way, classification prediction is implemented for the whole process of photographing by the endoscope in the alimentary canal, and systematic and complete image recognition is implemented.

It is to be understood that the foregoing general descriptions and the following detailed descriptions are only exemplary, and do not limit the embodiments of this disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated herein and constitute a part of this specification, illustrate embodiments consistent with this disclosure and, together with the specification, serve to explain the principles of the embodiments of this disclosure.

FIG. 1 is a schematic diagram of an implementation environment according to an embodiment of this disclosure.

FIG. 2 is a block diagram of an apparatus according to an exemplary embodiment.

FIG. 3 is a flowchart of a medical endoscope image recognition method according to an exemplary embodiment.

FIG. 4 is a flowchart of step 330 according to the embodiment corresponding to FIG. 3.

FIG. 5 is a flowchart of step 390 according to the embodiment corresponding to FIG. 3.

FIG. 6 is a flowchart of step 393 according to the embodiment corresponding to FIG. 5.

FIG. 7 is a flowchart of step 390 according to the embodiment corresponding to FIG. 3.

FIG. 8 is a flowchart of step 390 according to the embodiment corresponding to FIG. 3.

FIG. 9 is a flowchart of step 503 b according to the embodiment corresponding to FIG. 8.

FIG. 10 is a flowchart of a step of training a neural network by using low-quality images and non-low-quality images captured by an alimentary canal endoscope as samples, to obtain a neural network corresponding to a low-quality image category output probability and a non-low-quality image category output probability according to an exemplary embodiment.

FIG. 11 is a schematic diagram of an overall framework of image recognition under photographing by an alimentary canal endoscope according to an exemplary embodiment.

FIG. 12 is a schematic diagram of an endoscope image in a white light photographing mode according to an exemplary embodiment.

FIG. 13 is a schematic diagram of an endoscope image in an NBI mode according to the embodiment corresponding to FIG. 12.

FIG. 14 is a schematic diagram of an endoscope image in an iodine dyeing mode according to the embodiment corresponding to FIG. 12.

FIG. 15 is a block diagram of a medical endoscope image recognition system according to an exemplary embodiment.

DESCRIPTION OF EMBODIMENTS

Exemplary embodiments are described in detail herein, and examples of the exemplary embodiments are shown in the accompanying drawings. When the following descriptions are made with reference to the accompanying drawings, unless otherwise indicated, the same numbers in different accompanying drawings represent the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the embodiments of this disclosure. On the contrary, the implementations are merely examples of apparatuses and methods that are consistent with some aspects of the embodiments of this disclosure, as described in detail in the appended claims.

FIG. 1 is a schematic diagram of an implementation environment according to an embodiment of this disclosure. In an exemplary embodiment, the implementation environment includes an endoscopic imaging system including an endoscope 110, a display device 130, and a workstation 150. The endoscope 110 is used as a data source for image recognition; along with the movement and photographing of the endoscope 110 in alimentary canals, the display device 130 continuously displays video images. For example, the images are displayed by using each endoscope image frame captured by the endoscope 110.

On this basis, the image captured by the alimentary canal endoscope in this embodiment of this disclosure is also recognized by using the workstation 150 during examination with the alimentary canal endoscope, to implement systematic and comprehensive classification prediction, thereby obtaining the lesion region distribution in the endoscope image and the category of the distributed lesion region.

The workstation 150 is a host deployed for the endoscope, such as a micro-computer of any size, as long as it meets the performance requirements.

Hence, this disclosure includes a physical medical device, such as an endoscopic imaging system, which at least includes: a display device for a medical endoscope video and a workstation, implementing the following embodiments of the medical endoscope image recognition method by using a medical endoscope video stream outputted by an endoscope as an input.

Exemplarily, in the endoscopic imaging system, the medical endoscope video stream inputted to the workstation may be captured in real time by the endoscope, or may have been captured at an earlier time, which is not limited herein.

In an exemplary embodiment, the endoscopic imaging system also includes an endoscope; the accessed endoscope provides a data source for the workstation, and then the endoscope inputs the medical endoscope video to the workstation, so as to implement real-time recognition of a video image.

FIG. 2 is a block diagram of an apparatus according to an exemplary embodiment. For example, the apparatus 200 may be the workstation 150 in the implementation environment shown in FIG. 1. The workstation 150 may be a micro-computer in any form as long as it meets the performance requirements. For example, the workstation 150 may be a host connected to an endoscope.

Referring to FIG. 2, the apparatus 200 includes at least the following components: a processing component 202, a memory 204, a power supply component 206, a multimedia component 208, an audio component 210, a sensor component 214, and a communication component 216.

The processing component 202 generally controls overall operations of the apparatus 200, such as operations related to displaying, a phone call, data communication, a camera operation, and a record operation. The processing component 202 includes processing circuitry such as one or more processors 218 to execute instructions, to implement all or some steps of the following method. In addition, the processing component 202 includes one or more modules, to facilitate the interaction between the processing component 202 and other components. For example, the processing component 202 may include a multimedia module, to facilitate the interaction between the multimedia component 208 and the processing component 202.

The memory 204 is configured to store various types of data to support operations on the apparatus 200. Examples of the types of data include instructions of any application program or method to be operated on the apparatus 200. The memory 204 may be implemented by using a volatile or non-volatile storage device of any type or a combination thereof, for example, a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a disk, or an optical disc. The memory 204 further stores one or more modules, and the one or more modules are configured to be executed by the one or more processors 218, to implement all or some steps of the following method shown in any of FIG. 3, FIG. 4, FIG. 5, FIG. 6, FIG. 7, FIG. 8, FIG. 9, and FIG. 10.

The power supply component 206 provides power to various components of the apparatus 200. The power supply component 206 includes at least a power supply management system, one or more power supplies, and other components associated with generating, managing, and allocating power for the apparatus 200.

The multimedia component 208 includes a screen providing an output interface between the apparatus 200 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel. If the screen includes the touch panel, the screen may be implemented as a touchscreen to receive an input signal from the user. The touch panel includes one or more touch sensors to sense a touch, a slide, and a gesture on the touch panel. The touch sensor may not only sense the boundary of touching or sliding operations, but also detect duration and pressure related to the touching or sliding operations. The screen further includes an organic light emitting diode (OLED) display.

The audio component 210 is configured to output and/or input an audio signal. For example, the audio component 210 includes a microphone (MIC). When the apparatus 200 is in an operating mode, such as a call mode, a record mode, or a speech recognition mode, the microphone is configured to receive an external audio signal. The received audio signal may further be stored in the memory 204 or be sent by using the communication component 216. In some embodiments, the audio component 210 further includes a speaker, configured to output an audio signal.

The sensor component 214 includes one or more sensors, configured to provide status evaluation in each aspect to the apparatus 200. For example, the sensor component 214 detects a power-on/off state of the apparatus 200 and a relative location of a component. The sensor component 214 further detects changes in a location of the apparatus 200 or a component of the apparatus 200 and a temperature change of the apparatus 200. In some embodiments, the sensor component 214 further includes a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 216 is configured to facilitate communication in a wired or wireless manner between the apparatus 200 and other devices. The apparatus 200 accesses a communication standard-based wireless network, such as Wi-Fi. In an exemplary embodiment, the communication component 216 receives, by using a broadcast channel, a broadcast signal or broadcast-related information from an external broadcast management system. In an exemplary embodiment, the communication component 216 further includes a near field communication (NFC) module to facilitate short-distance communication. For example, the NFC module may be implemented based on a radio frequency identification (RFID) technology, an Infrared Data Association (IrDA) technology, an ultra wideband (UWB) technology, a Bluetooth technology, or another technology.

In an exemplary embodiment, the apparatus 200 is implemented by using processing circuitry such as one or more application-specific integrated circuits (ASICs), a digital signal processor, a digital signal processing device, a programmable logic device, a field-programmable gate array, a controller, a micro controller, a microprocessor, or other electronic elements, and is configured to perform the following method.

FIG. 3 is a flowchart of a medical endoscope image recognition method according to an exemplary embodiment, taking, as an example, the workstation in the implementation environment of FIG. 1 executing the method. In an exemplary embodiment, the medical endoscope image recognition method, as shown in FIG. 3, can include at least the following steps.

In step 310, original endoscope images are obtained according to a medical endoscope video stream.

The medical endoscope video stream is a video stream captured by an endoscope in a medical environment, for example, a real hospital usage environment. During the movement and photographing of the endoscope, the medical endoscope video stream presents the endoscopic view captured by the lens of the endoscope. Hence, original endoscope image frames can be obtained according to the medical endoscope video stream. Since each original endoscope image frame describes the endoscopic view captured by the endoscope at a time point, the medical endoscope image can be recognized based on each original endoscope image frame.

As can be understood, during the implemented medical endoscope image recognition, an endoscope captures the medical endoscope video stream in an organism, for example, a human body. Exemplarily, the endoscope captures the medical endoscope video stream in a tract communicated with the outside or in a sealed body cavity. For example, the tract communicated with the outside may be an alimentary canal, a respiratory tract, a urinary tract, or the like; the sealed body cavity may be a cavity that needs an incision for the endoscope to be fed in, such as the chest, an abdominal cavity, or a joint cavity. Capturing and recognizing the medical endoscope video stream by using the endoscope can obtain organ conditions in the corresponding tract.

In the process of using the endoscope to examine the tract, the obtained medical endoscope video stream is inputted to the workstation for recognizing the medical endoscope image. In addition, a medical endoscope video stream obtained before, for example, a historical medical endoscope video stream, may also be subjected to the medical endoscope image recognition. That is, recognition may be performed based on a real-time captured image, and may also be performed based on a large quantity of stored medical endoscope video streams. The medical endoscope video stream obtained through real-time photographing by the endoscope in the alimentary canal is taken as an example for detailed explanation below.

In the process of using the endoscope to examine the alimentary canal, the obtained medical endoscope video stream is inputted to the workstation, and the corresponding original endoscope image is obtained from the current alimentary canal image, for real-time recognition based on the original endoscope image.

As can be understood, the current image displayed by the display device 130 of the endoscopic imaging system is the alimentary canal image. Along with the movement and photographing of the endoscope in alimentary canals, the display device 130 displays the video of alimentary canal images through the inputted medical endoscope video stream, and at this time, the original endoscope image required by image recognition is obtained from the current image. The original endoscope image is an original image directly obtained under the photographing by the endoscope, and on this basis, the medical endoscope image is recognized.

In an exemplary embodiment, step 310 includes: obtaining the original endoscope images from the inputted medical endoscope video stream along with movement and photographing of the endoscope in the tract or the sealed body cavity.

The tract communicated with the outside and the sealed body cavity do not contain only a single organ. Taking the alimentary canal as an example, the alimentary canal includes a plurality of sub-organs, such as the stomach, esophagus, pharynx, and duodenum. During examination of the alimentary canal, the endoscope moves and continuously takes photos in the alimentary canal so as to obtain a video stream related to the sub-organs. Accordingly, the original endoscope images constituting the video stream are images related to the sub-organ where the endoscope is located, and indicate the state of the sub-organ.

It is to be explained that, when examining the tract, with the movement and continuous photographing of the endoscope, the photographing of the sub-organ where the endoscope is located is not limited to a single original endoscope image; that is, a plurality of obtained original endoscope images all correspond to one sub-organ. Therefore, the subsequent recognition of the medical endoscope image using the original endoscope images as inputs actually relates to classification prediction for the sub-organ where the endoscope is located.

In another exemplary embodiment, the inputted medical endoscope video stream is not obtained through real-time photographing. For example, recognition of the medical endoscope image according to this embodiment of this disclosure is conducted based on the stored medical endoscope video streams. In this scenario, step 310 can include:

obtaining the stored medical endoscope video stream; and

obtaining the original endoscope images from the medical endoscope video stream, the original endoscope images being used for recognizing a lesion region in a tract or a sealed body cavity photographed by an endoscope and identifying a lesion category of the lesion region.

The stored medical endoscope video streams are recognized one by one using the medical endoscope image recognition method provided by the embodiments of this disclosure, so as to recognize the lesion region and lesion category of the organ part related to the original endoscope image, to implement the processing of a large quantity of historical medical endoscope video streams.

In this exemplary embodiment, the medical endoscope video stream is no longer obtained through the output of the endoscope; instead, the stored medical endoscope video streams are obtained, so as to obtain the original endoscope images therefrom.

Through the implementation of this exemplary embodiment, a large quantity of stored medical endoscope video streams can also be recognized, so as to facilitate medical research and provide automatic video image recognition for the real medical environment.

In step 330, the original endoscope images are filtered by using a neural network, to generate target endoscope images.

First, it is to be explained that all the original endoscope images obtained from the medical endoscope video stream need to be filtered, to filter out the interference in the images. As can be understood, not all of the large quantity of original endoscope images obtained from the medical endoscope video stream can be used for the recognition of the medical endoscope images; some cannot be used for the recognition due to influences caused by various factors during photographing. These images exist as interference and thus need to be filtered out, such as the original endoscope images obtained by photographing during switching and/or shaking of the endoscope, and the original endoscope images obtained when the lens encounters various liquids and foreign matters during photographing. These original endoscope images are low-quality images, exist as interference for the recognition, and need to be recognized and filtered out by using the neural network.

For example, whether the obtained original endoscope images are low-quality images is recognized by using the neural network, and the original endoscope images that are low-quality images are filtered out. Accordingly, the used neural network is obtained through training by using the low-quality images as samples.

In the process of actual usage of the endoscope, since the endoscope is unavoidably switched and shaken in the alimentary canal and the photographing lens also unavoidably encounters various liquids and foreign matters, the original endoscope images obtained through photographing include a large quantity of low-quality and noisy images. Recognizing and filtering the low-quality images from the original endoscope images obtained in step 310 by using the neural network shields the influence of the low-quality images on image recognition, greatly improving the robustness. In addition, useless and unnecessary images are filtered out in advance through recognizing and filtering the low-quality images, so that computing resources consumed by executing the follow-up steps are reduced, and therefore, speed and real-time performance can be effectively improved.

The low-quality image recognition performed on the original endoscope images is implemented by using the trained neural network. This neural network is trained according to endoscope image samples that are low-quality images and endoscope image samples that are non-low-quality images. For an inputted original endoscope image, the trained neural network can output a probability of being a low-quality image and a probability of being a non-low-quality image, so as to finally determine whether the original endoscope image is a low-quality image or a non-low-quality image; an original endoscope image determined to be a low-quality image is filtered out and not processed by the follow-up steps.

Recognition of the low-quality images among the obtained multiple original endoscope images is conducted by using the neural network, to filter out the low-quality images included in the obtained multiple original endoscope images, complete the filtering of the low-quality images, and generate target endoscope images, so as to subsequently recognize the organ parts that the endoscope enters.

In an exemplary embodiment, the original endoscope images inputted into the neural network for low-quality image recognition are necessarily adapted to the neural network to ensure consistency and accuracy of actual prediction. Hence, before predicting the low-quality images, pre-processing the original endoscope images is further required, for example, size adjusting, cutting, and the like, to obtain original endoscope images with a size adapted to the neural network.

The target endoscope images are the remaining original endoscope images after eliminating the low-quality images in the original video images. At this point, by filtering the original endoscope images, the generated target endoscope images can shield interference, reduce the data amount, and also enhance the accuracy of recognition.

It is to be understood that, when filtering the original endoscope images, the two major categories of original endoscope images used for training the neural network, that is, the low-quality images and the non-low-quality images, are relative: depending on the required filtering precision, the same original endoscope image may be regarded as either a low-quality image or a non-low-quality image.

In step 350, organ information corresponding to the target endoscope images is recognized by using the neural network.

As the endoscope moves and continuously photographs within the tract, the endoscope is located at a position in the tract, for example, at a certain sub-organ in the alimentary canal. However, the endoscope does not output the organ part where it is located; the organ part often has to be recognized manually by inspecting the endoscope image, so as to facilitate the implementation of accurate classification prediction of the endoscope image for the organ part where the endoscope is located.

Moreover, in the recognition implemented by this exemplary embodiment, for the target endoscope images generated through filtering out the low-quality images, the alimentary canal part where the endoscope is currently located is recognized. For example, recognition is performed to obtain the organ information corresponding to the target endoscope image, and the organ information indicates the organ part in the tract where the endoscope is located when capturing the target endoscope image.

In an exemplary embodiment, recognizing the organ information of the target endoscope image is also implemented by using the constructed neural network; the target endoscope image is used as an input, and information of the organ where the endoscope is located when capturing the target endoscope image is outputted.

For example, using the alimentary canal as an example, the constructed neural network may be a four-class network. To be adapted to the sub-organs of the alimentary canal, such as the stomach, esophagus, pharynx, and duodenum, the four-class network is pre-constructed, to recognize the target endoscope image so as to recognize the organ location where the endoscope is located.

Accordingly, the four-class network is obtained by training on original endoscope images annotated with the corresponding alimentary canal parts, that is, the sub-organs of the alimentary canal. The four-class network adapted to the alimentary canal parts performs the recognition of the alimentary canal part where the endoscope is currently located. Because the endoscope images used for network training, that is, the samples of the four-class network, cover all the alimentary canal parts, recognition is no longer limited to a single sub-organ, thereby enhancing the recognition performance for images captured by the endoscope in the alimentary canal.
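
As an illustrative sketch only, such a four-class network could be instantiated with an off-the-shelf DenseNet backbone; the class names, backbone choice, and helper functions below are assumptions for illustration, not part of the disclosed embodiments:

    import torch
    from torchvision import models

    # Assumed label set for the alimentary canal sub-organs (illustrative only).
    ORGAN_CLASSES = ["pharynx", "esophagus", "stomach", "duodenum"]

    def build_organ_classifier(num_classes: int = len(ORGAN_CLASSES)) -> torch.nn.Module:
        # DenseNet backbone with its final layer replaced by a four-way head.
        net = models.densenet121(weights=None)
        net.classifier = torch.nn.Linear(net.classifier.in_features, num_classes)
        return net

    def predict_organ(net: torch.nn.Module, image: torch.Tensor) -> str:
        # image: a (1, 3, 224, 224) tensor pre-processed as described in step 331 below.
        net.eval()
        with torch.no_grad():
            probs = torch.softmax(net(image), dim=1)
        return ORGAN_CLASSES[int(probs.argmax(dim=1))]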

In step 370, an image type suitable for the target endoscope images is identified according to the corresponding organ information by using a classification network.

Through executing the steps above, after the organ part is localized and the organ information is recognized for the target endoscope image, the photographing mode for the target endoscope image can be switched according to the organ information.

The image type to which the target endoscope image is adapted is the image type that can best enhance the endoscopic view in the target endoscope image. Through identifying the image type, the most proper photographing mode can be determined for the target endoscope image. For example, based on the image type, the photographing mode can be switched to the one corresponding to the image type.

Exemplarily, the recognition of the image type to which the target endoscope image is adapted is implemented by using the classification network. In an exemplary embodiment, corresponding to the image type division, the classification network may be a three-class network, that is, a classification network that divides images into three image types, so as to implement the recognition of the target endoscope image for the three image types.

It is to be understood that different photographing modes correspond to different image types. Therefore, when three photographing modes are set, such as white light, Narrow Band Imaging (NBI), and iodine dyeing modes, three corresponding image types exist. Hence, the photographing mode to which the image content in the target endoscope image is adapted can be determined through the recognition of the image type; that is, the photographing mode corresponding to the image type can be identified.

For example, according to the alimentary canal part indicated in the organ information and the image content of a suspicious lesion or disease lesion region in the target endoscope image, it is determined through the recognition by the neural network that the target endoscope image is of the image type corresponding to NBI, and that image type corresponds to the NBI photographing mode.

In step 390, a lesion region in each of the target endoscope images is localized according to a part indicated by the organ information, and a lesion category of the lesion region in the photographing mode corresponding to the image type is identified.

During the execution of the preceding steps, after the sub-organ where the endoscope is located when photographing the target endoscope image is known, that is, after the organ part is determined, the target endoscope image in the photographing mode to which the sub-organ is adapted can be obtained, so as to implement the localization of the lesion region and the identification of the lesion category of the lesion region.

It is to be explained that the organ part indicated by the organ information corresponds to multiple target endoscope images. Therefore, the target endoscope image adapted to the photographing mode can be obtained from the multiple target endoscope images obtained by photographing the organ part, to localize the lesion region on the target endoscope image and identify the lesion category of the lesion region for the organ part.

The photographing mode applies to the target endoscope image that images the organ part. Exemplarily, the photographing mode involves the image type, dyeing type, and the like. For example, the photographing mode includes three modes: white light, NBI, and iodine dyeing. For imaging the organ part, different lesion conditions on the target endoscope image are adapted to different photographing modes.

For example, the white light mode is normally adopted, and when a suspicious lesion or a disease lesion region exists on the organ part, the white light mode is switched to the NBI mode. Since the image colors, textures, and details corresponding to the target endoscope images in different photographing modes are greatly different, the lesion region can be more accurately localized through switching the photographing mode, so as to identify the lesion category of the lesion region.

After the target endoscope image is recognized to obtain the image type to which it is adapted, the corresponding photographing mode is determined according to the identified image type, so as to directly switch the target endoscope image into the determined photographing mode, thereby obtaining the target endoscope image in the photographing mode to which the organ part where the endoscope is currently located is adapted, and enhancing the accuracy of the image content represented by the target endoscope image. Through this exemplary embodiment, the photographing mode for the target endoscope image is dynamically adjusted, so as to enhance the accuracy rate of image recognition.

In the photographing mode to which the organ part where the endoscope is located is adapted, localizing the lesion region of the target endoscope image and identifying the lesion category thereof can greatly improve the system performance and the accuracy rate of the recognition result.

Through the execution of step 350, the organ information is obtained; the organ information corresponds to the target endoscope image obtained by filtering out the low-quality images; based on the organ information, localizing the lesion region of the target endoscope image and identifying the lesion category thereof can be performed in the adapted mode. The target endoscope image obtained by filtering out the low-quality images and corresponding to the organ information can fall into the following two cases. On one hand, the target endoscope image may already be adapted to the photographing mode suitable for the current alimentary canal part; for example, the adapted photographing mode is the white light mode, and the target endoscope image corresponds to the white light mode, which is consistent with the photographing mode that needs to be used. On the other hand, the target endoscope image may have a photographing mode that is not adapted to the photographing mode suitable for the organ part; for example, the photographing mode used by the endoscope image is the white light mode, while the photographing mode needing to be used is the NBI mode. In that case, it is required to switch the photographing mode of the target endoscope image.

Exemplarily, the executed lesion region localization and lesion category identification are both implemented by using a deep learning network. The lesion region localization may adopt a localization detection network, for example, an end-to-end real-time target detection network YOLO (You Only Look Once, a deep learning network for target detection), and may also adopt other detection networks (e.g., FasterRCNN); the lesion category identification is implemented using the classification network; this classification network may be a Densely Connected Convolutional Network (DenseNet for short).

It is to be further explained that the localization detection network deployed for the lesion region localization may be deployed uniformly, for example, different organs use the same localization detection network; alternatively, the localization detection network may be separately deployed according to the corresponding organ information, such as the alimentary canal parts. The classification network deployed for the lesion category identification may likewise be deployed either way, as determined according to experimental effects. If the network is separately deployed according to the alimentary canal parts, the deep learning network only needs to be trained separately.

Through the exemplary embodiments as stated above, a more complete and available medical endoscope image recognition system with strong robustness can be implemented, so as to comprehensively assist a doctor in diagnosis in many respects and improve diagnosis efficiency. Localizing the lesion region of the target endoscope image and identifying the lesion category thereof can effectively help avoid missed diagnosis during the alimentary canal examination by the endoscope, effectively assist the doctor in determining the lesion property in real time, and improve the accuracy rate of the determination.

In addition, using the neural network to filter out low-quality endoscope images can effectively improve the noise-proof capability and also improve the system availability.

The medical endoscope image recognition according to the exemplary embodiments above is implemented by means of deep learning; manual intervention is no longer needed for profound understanding of the medical image, and a manually designed feature extraction solution is no longer needed either, thereby avoiding omission and erroneous judgment caused by incomplete feature extraction.

FIG. 4 is a flowchart of step 330 according to the embodiment corresponding to FIG. 3. In an exemplary embodiment, as shown in FIG. 4, step 330 includes at least the following steps.

In step 331, the original endoscope images are processed according to a set size to generate standard endoscope images.

The standard endoscope image is an endoscope image adapted to the input size required by the neural network, for input to the neural network. For recognition and filtering of the low-quality images among the original endoscope images, it is first required to pre-process the data, such as adjusting the size of the obtained original endoscope image, so that the generated standard endoscope image can be adapted to the input of the neural network and consistency is ensured.

For example, according to the set size, the process of processing the original endoscope images includes: first executing a resize operation, and then cutting by using an image scaling method, such as a center crop method (a central cropping method), to obtain the standard endoscope image with the set size.

The resize operation is an adjusting operation for the original endoscope image; exemplarily, the execution process of the resize operation may be: maintaining the length-width ratio, and scaling the short edge to 224 pixels, with the long edge being greater than or equal to 224 pixels. The execution process of the center crop method may be: using the long edge of the original endoscope image as a standard, and cutting the central region of the original endoscope image so that the long edge becomes 224, so as to obtain a standard endoscope image conforming to the set size, to ensure the consistency of network prediction.
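
As a rough sketch of this pre-processing under the stated 224-pixel setting (the transform pipeline and the file name below are illustrative assumptions, not part of the disclosure), the resize and center-crop steps could be expressed as follows:

    from PIL import Image
    from torchvision import transforms

    # Resize the short edge to 224 while keeping the aspect ratio, then crop
    # the central 224 x 224 region, matching the resize + center-crop steps above.
    preprocess = transforms.Compose([
        transforms.Resize(224),       # short edge -> 224, long edge >= 224
        transforms.CenterCrop(224),   # cut the central region to 224 x 224
        transforms.ToTensor(),
    ])

    frame = Image.open("frame.png").convert("RGB")  # hypothetical input frame
    standard_image = preprocess(frame)              # tensor of shape (3, 224, 224)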

In step 333, prediction of whether the standard endoscope images are low-quality images or non-low-quality images is performed by using the neural network. A low-quality image is, for example, a standard endoscope image containing interference.

In the real hospital usage environment, there can be many types of low-quality images, including blurred, abnormally colored, and/or over-exposed unqualified images. Based on the unqualified images, the neural network is used for implementing a classification task, so as to filter the low-quality images out of the standard endoscope images. Exemplarily, the neural network may be a deep convolutional neural network, such as DenseNet.

Taking the standard endoscope images processed to the set size as inputs, prediction of low-quality images and non-low-quality images is performed by using the trained neural network, so that the neural network outputs, for each standard endoscope image, a probability of being a low-quality image and a probability of being a non-low-quality image, and finally determines whether the standard endoscope image is a low-quality image or a non-low-quality image, so as to obtain the target endoscope images. In this exemplary embodiment, accordingly, the target endoscope image is an endoscope image adapted to the neural network and obtained by the size processing on the original endoscope image.
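
A minimal sketch of this filtering step, assuming a trained two-class network and the convention that class 0 is low-quality and class 1 is non-low-quality (this class ordering is an assumption for illustration):

    import torch

    def filter_low_quality(net: torch.nn.Module, images: torch.Tensor) -> torch.Tensor:
        # images: a batch of standard endoscope images, shape (N, 3, 224, 224).
        # Assumed convention: class 0 = low-quality, class 1 = non-low-quality.
        net.eval()
        with torch.no_grad():
            probs = torch.softmax(net(images), dim=1)  # (N, 2) per-class probabilities
        keep = probs[:, 1] > probs[:, 0]               # non-low-quality is more likely
        return images[keep]                            # the target endoscope images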

The trained neural network is constructed by executing the network training process after a large quantity of original endoscope images are divided into low-quality images and non-low-quality images. In an exemplary embodiment, the large quantity of original endoscope image samples can be obtained by augmenting the original endoscope images, so as to provide more samples for the training of the neural network.

In step 335, the standard endoscope images that are low-quality images are filtered out to obtain the target endoscope images.

After the original endoscope images obtained from the medical endoscope video stream are processed and predicted based on the steps above, the endoscope images among the original endoscope images that correspond to low-quality images can be determined. In this case, the original endoscope images that are low-quality images can be filtered out, which can effectively prevent useless and unnecessary images from entering the follow-up recognition process of the medical endoscope image.

Through this exemplary embodiment as stated above, recognition and filtering of the low-quality images are implemented for the medical endoscope image recognition, so that it can actually be applied to a real production environment, such as a hospital, without influence from the switching and shaking of the endoscope in the tract, and also without influence from the various liquids and foreign matters encountered in the tract by the endoscope.

FIG. 5 is a flowchart of step 390 according to the embodiment corresponding to FIG. 3. In an exemplary embodiment, as shown in FIG. 5, step 390 includes the following steps.

In step 391, a foreign matter in each target endoscope image in the photographing mode corresponding to the image type is detected, to obtain a foreign matter frame distributed in each target endoscope image, the foreign matter frame being used for indicating a region having a foreign matter in each target endoscope image.

In step 393, the target endoscope images are filtered according to the foreign matter frame, the lesion region is localized by using the target endoscope images that remain after the filtering, and the lesion category of the lesion region is identified.

For the target endoscope image in the photographing mode corresponding to the adapted image type, before the lesion region is localized and the lesion category is identified, the foreign matters in the target endoscope image are further detected and localized, so as to filter out the foreign matters that influence the image content in the target endoscope image.

It is to be understood that, taking the alimentary canal as an example, there are often special foreign matters, such as intraoperative instruments and saliva, in the esophagus and stomach. Moreover, the image content of the target endoscope images captured by the endoscope in the alimentary canal may commonly contain intraoperative instruments, saliva, and other foreign matters. Hence, a target endoscope image in which a foreign matter is detected cannot be directly filtered out.

In this case, it is necessary to estimate, according to the distribution of the foreign matter in the target endoscope image, whether the existing foreign matter would interfere with the follow-up lesion region localization of the target endoscope image; filtering out the target endoscope images with high foreign matter interference improves the noise-proof capability of the system and enhances the availability of image recognition.

For example, the detection of the foreign matter faces the target endoscope image adapted to the photographing mode; for this target endoscope image, the neural network is used for detecting the foreign matter in the image, and obtaining a foreign matter frame localized on the target endoscope image.

The foreign matter frame is used for indicating a region occupied by the foreign matter in the target endoscope image. It is to be understood that the foreign matter frame annotates the distribution of the foreign matter in the target endoscope image. The foreign matter frame is substantially a region occupied by the intraoperative instruments or a region occupied by the saliva.

Through foreign matter detection, the obtained foreign matter frame distributed on the target endoscope image is represented in the form of coordinates. This process implements target detection by using the neural network; under the action of the neural network, in addition to the coordinates representing the foreign matter frame, a confidence that the foreign matter frame corresponds to a foreign matter, such as a probability, is further outputted.

Exemplarily, for a foreign matter, if the foreign matter frame corresponding to the foreign matter is a square frame, the coordinates of the foreign matter may be determined by the square frame, and may include four pieces of coordinate information, namely x_min, y_min, x_max, and y_max.

After performing the foreign matter detection on the target endoscope image to obtain the foreign matter frame distributed in the target endoscope image, the foreign matter frame distributed in the target endoscope image can be used for evaluating whether to filter out the target endoscope image, to shield the interference caused by excess foreign matters.

In an exemplary embodiment, step 391 includes: inputting each target endoscope image in the photographing mode corresponding to the image type into the neural network, performing target detection by using the neural network, and outputting coordinates and a confidence that correspond to the foreign matter frame, where the coordinates are used for indicating a distribution of the foreign matter frame in each target endoscope image.

The neural network for foreign matter detection may be a YOLO localization network, and may also be another deep detection network, which is not limited herein. By using the deployed neural network, the entire target endoscope image is used as an input, and the location of the foreign matter frame, such as the coordinates, and the category thereof are regressed at an output layer, this category being the foreign matter. That is, the confidence outputted by the neural network represents the possibility that the localized foreign matter frame corresponds to a foreign matter.

Further, in an exemplary embodiment, FIG. 6 is a flowchart of step 393 according to the embodiment corresponding to FIG. 5. Step 393, as shown in FIG. 6, includes the following steps.

In step 401, an area proportion factor of an area occupied by the foreign matter in each target endoscope image is determined according to the coordinates and the confidence corresponding to the foreign matter frame in each target endoscope image.

After obtaining the coordinates and confidence corresponding to the foreign matter frame in the target endoscope image through foreign matter detection, the area proportion factor of all foreign matters on the target endoscope image is calculated according to the coordinates and the confidence.

Exemplarily, a foreign matter frame area S_i is first calculated according to the coordinates. The foreign matter frame area is the area occupied by the foreign matter frame. Then the corresponding confidence P_i is used as a coefficient to correct the foreign matter frame area, giving P_i S_i. Finally, the corrected areas of all the foreign matter frames are added, that is, the areas P_i S_i of the foreign matter frames are summed, and the ratio of the sum to the total area of the target endoscope image is calculated, to finally obtain the area proportion factor of the area occupied by the foreign matter in the target endoscope image.

In an exemplary embodiment, the area proportion factor corresponding to the target endoscope image may be calculated through the following formula:

$f = \frac{\sum_{i}{P_{i}S_{i}}}{HW}$

where f is the area proportion factor; H is the height of the target endoscope image; W is the width of the target endoscope image; i is an identifier of the foreign matter frame, with i greater than or equal to 1; P_i is the confidence of the ith foreign matter frame, that is, P_i = confidence; and S_i is the area of the ith foreign matter frame, S_i = (x_max,i − x_min,i) * (y_max,i − y_min,i).

In step 403, interference of the foreign matter with each target endoscope image is determined according to the area proportion factor, and the target endoscope images with foreign matter interference are filtered out.

After the area proportion factor of the area occupied by the foreign matter in the target endoscope image is calculated, the interference of the foreign matter with each target endoscope image can be determined according to the numerical value of the area proportion factor. As can be understood, the greater the numerical value of the area proportion factor, the greater the interference with the target endoscope image; the smaller the numerical value, the smaller the interference with the target endoscope image, and the less influence on the following lesion region localization and category identification of the target endoscope image.

Hence, target endoscope images with relatively large area proportion factors are filtered out, and these images are considered to be the target endoscope images with foreign matter interference.

In an exemplary embodiment, a threshold f₀ is first set, and the default value of f₀ may be 0.1. When f is greater than the threshold f₀, it is determined that the target endoscope image is a target endoscope image with foreign matter interference, and the target endoscope image needs to be filtered out.

When f is smaller than the threshold f₀, lesion region localization for the target endoscope image and lesion category identification for the localized lesion region are continued.
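
A minimal sketch of steps 401 and 403, assuming the detector output is represented as (x_min, y_min, x_max, y_max, confidence) tuples; this representation and the function names are illustrative assumptions:

    from typing import List, Tuple

    # One foreign matter frame: (x_min, y_min, x_max, y_max, confidence).
    Box = Tuple[float, float, float, float, float]

    def area_proportion_factor(boxes: List[Box], height: float, width: float) -> float:
        # f = sum(P_i * S_i) / (H * W), per the formula above.
        weighted_area = sum(p * (x_max - x_min) * (y_max - y_min)
                            for x_min, y_min, x_max, y_max, p in boxes)
        return weighted_area / (height * width)

    def has_foreign_matter_interference(boxes: List[Box], height: float,
                                        width: float, f0: float = 0.1) -> bool:
        # Filter the image out when f exceeds the threshold f0 (default 0.1).
        return area_proportion_factor(boxes, height, width) > f0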

Through this exemplary embodiment, foreign matter localization and anti-interference are implemented, so as to resolve the special foreign matter problems in the alimentary canal, such as intraoperative instruments and saliva in the esophagus and stomach, thereby reducing the influence on image recognition due to the presence of the foreign matter.

FIG. 7 is a flowchart of step 390 according to the embodiment corresponding to FIG. 3. In an exemplary embodiment, as shown in FIG. 7, step 390 at least includes the following steps.

In step 501 a, an image type of the target endoscope image is detected.

In step 503 a, when the detected image type is inconsistent with the identified image type, the photographing mode corresponding to the target endoscope image is switched according to the photographing mode corresponding to the identified image type, to obtain a target endoscope image in the photographing mode corresponding to the image type.

After the image type suitable for the target endoscope images is obtained through recognition, whether to switch the photographing mode can be determined according to the image type of the target endoscope image, to ensure that the photographing mode of the target endoscope image is suitable.

For example, only when the image type of the target endoscope image is inconsistent with the image type obtained through recognition is the photographing mode of the target endoscope image switched, to obtain the target endoscope image in the photographing mode corresponding to the image type suitable for the target endoscope images.

FIG. 8 is a flowchart of step 390 according to the embodiment corresponding to FIG. 3. In an exemplary embodiment, as shown in FIG. 8, step 390 includes the following steps.

In step 501 b, continuous feature extraction for each target endoscope image in the photographing mode corresponding to the image type is performed by using each layer of a localization detection network, until the lesion region in each target endoscope image is finally obtained through regression.

In step 503 b, a lesion property of the lesion region in each target endoscope image is classified by using the classification network, to obtain the lesion category of the lesion region.

The localization detection network is used for performing target detection on the target endoscope image, to implement the lesion localization in the target endoscope image and output two-dimensional coordinates of the lesion region. Exemplarily, the localization detection network is an end-to-end real-time target detection algorithm, such as YOLO, to meet the real-time requirements of image recognition. The localization detection network may also be replaced with other detection networks, such as FasterRCNN.

The process of performing continuous feature extraction by using each layer of the localization detection network to obtain the lesion region in each target endoscope image through regression obtains more features and is more comprehensive, and thus can avoid incomplete feature extraction and the omission and misjudgment caused thereby.

In an exemplary embodiment, the lesion region obtained upon location detection is represented in the form of two-dimensional coordinates. The localization detection network finally outputs the two-dimensional coordinates for localizing the lesion region on the target endoscope image.

For example, the localization of the lesion region by YOLO relates to the regression problems of extracting image bounding boxes and category probabilities. At this point, through the continuous feature extraction in each layer of the network, the two-dimensional coordinates and probability are finally obtained by regression, and therefore, the accuracy of localization is improved while the real-time performance of the detection is ensured.

For the localization detection network that implements lesion region localization, in an exemplary embodiment, network training is performed using an open-source image data set, so as to obtain the parameters and weight values of each network layer; for example, the parameters and weight values of the convolution layers can be obtained, so as to construct a localization detection network with better generalization performance. The open-source image data set contains more than a million images; training the localization detection network on it may avoid overfitting, so that the network training converges better toward an optimal point.
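A minimal sketch of such pretraining-based initialization, assuming a recent PyTorch/torchvision setup with DenseNet-121 and ImageNet weights as stand-ins (the disclosure does not name a specific library, architecture, or data set for this step):

    import torch
    import torchvision

    # Start from convolution-layer parameters and weights pretrained on a
    # million-scale open-source image data set (ImageNet here, as a
    # stand-in), then fine-tune on endoscope images to avoid overfitting.
    model = torchvision.models.densenet121(weights="IMAGENET1K_V1")

    # Replace the head for the task at hand; the remaining layers keep
    # their pretrained parameters and weight values as the starting point.
    model.classifier = torch.nn.Linear(model.classifier.in_features, 6)

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()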

In addition, low-quality images are also added into the training process of the localization detection network. For example, the localization detection network is trained with low-quality endoscope images included, to enhance the robustness and anti-noise capability of the localization detection network and to reduce the false positive ratio.

After localizing the lesion region, the recognition of the lesion category of the lesion region can be executed. Exemplarily, the category may include normal, precancerous disease lesion, early cancer, advanced cancer, inflammatory disease lesion, and other disease lesions, which are not limited herein.

In an exemplary embodiment, the classification network implementing the lesion category identification may be based on Densenet. The input of the classification network is a lesion region in the target endoscope image, and the output thereof is the lesion category corresponding to the lesion region.

At this point, the lesion region localization and lesion category recognition together implement a more complete and available image recognition solution that is not limited to a single function, ensuring the comprehensiveness of the supported functions.

FIG. 9 is a flowchart of step 503 b according to the embodiment corresponding to FIG. 8. In an exemplary embodiment, as shown in FIG. 9, step 503 b includes the following steps.

In step 601, the lesion region in each target endoscope image is extended, to obtain an extended region corresponding to the lesion region.

As can be understood, the localized lesion regions in the target endoscope image are not consistent with one another in size. Therefore, each lesion region is first extended, so as to obtain an extended region corresponding to each lesion region subjected to the lesion category identification.

The external expansion of the region ensures that the lesion region used for recognition carries certain contextual semantic information. Features related to lesions often exist around the lesion region; for example, a lesion does not have a strict boundary but is rather a gradually changing process. Therefore, the external expansion of the region provides more information for the classification network to learn, so as to avoid missing useful boundary information.

In an exemplary embodiment, the external expansion of the lesion region is a process of setting the proportions of up, down, left, and right external expansion of this lesion region. For example, the lesion region is extended by 10% upwards, downwards, leftwards, and rightwards.
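A minimal sketch of this 10% external expansion (the clamping to the image borders is an added assumption, so the extended region stays inside the image):

    def expand_box(x0: float, y0: float, x1: float, y1: float,
                   img_w: int, img_h: int, ratio: float = 0.10):
        # Extend a lesion box outward by `ratio` of its own width/height
        # on each of the four sides (10% up, down, left, and right here),
        # clamped to the image borders.
        dx = (x1 - x0) * ratio
        dy = (y1 - y0) * ratio
        return (max(0.0, x0 - dx), max(0.0, y0 - dy),
                min(float(img_w), x1 + dx), min(float(img_h), y1 + dy))

    # A 100x100 lesion box grows by 10 pixels on every side:
    print(expand_box(200, 200, 300, 300, 1000, 1000))  # (190.0, 190.0, 310.0, 310.0)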

In step 603, the extended region is pre-processed to normalize the extended region into a classification network input image meeting an input size.

The extended region is pre-processed so that it is normalized into an image of the input size, ensuring that the input requirements of the classification network are met.

In an exemplary embodiment, the pre-processing includes a center crop operation. Moreover, the corresponding classification network training process pre-processes the classification network input images through a data enhancement method, so as to expand the samples.
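A minimal sketch of a center crop used this way (assuming an (H, W, C) array at least as large as the target size; real pre-processing would typically resize the region first):

    import numpy as np

    def center_crop(image: np.ndarray, size: int) -> np.ndarray:
        # Cut a size x size window from the middle of an (H, W, C) image
        # as a simple way to normalize the extended region to the
        # classifier's fixed input size. Assumes both dimensions >= size.
        h, w = image.shape[:2]
        top = (h - size) // 2
        left = (w - size) // 2
        return image[top:top + size, left:left + size]

    # Example: normalize a 300x260 region to a 224x224 network input.
    region = np.zeros((300, 260, 3), dtype=np.uint8)
    print(center_crop(region, 224).shape)  # (224, 224, 3)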

In step 605, network prediction on a lesion category of the input image is performed by using the classification network, to obtain the lesion category of the corresponding lesion region in each target endoscope image.

After the image of the extended region including the context information is obtained through the preceding steps, the image is inputted into the classification network, so that network prediction of the lesion category can be performed on the corresponding lesion region; in this way, the lesion category of the lesion region in the endoscope image can be identified.

Exemplarily, the classification network for implementing category identification may be a Densenet model. The lesion categories outputted by the classification network may be six categories, such as normal, precancerous disease lesion, early cancer, advanced cancer, inflammatory disease lesion, and other disease lesions. In this case, the classification network is actually a six-class network.
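A minimal inference sketch for this six-class step (in PyTorch; the category order and the stand-in model are placeholders, since the disclosure fixes neither):

    import torch

    CATEGORIES = ["normal", "precancerous disease lesion", "early cancer",
                  "advanced cancer", "inflammatory disease lesion",
                  "other disease lesions"]

    @torch.no_grad()
    def classify_lesion(model: torch.nn.Module, crop: torch.Tensor) -> str:
        # crop: a (3, H, W) tensor holding the pre-processed extended
        # lesion region; returns the predicted lesion category name.
        logits = model(crop.unsqueeze(0))          # add a batch dimension
        probs = torch.softmax(logits, dim=1)       # per-category probabilities
        return CATEGORIES[int(probs.argmax(dim=1))]

    # Example with a stand-in model mapping a flattened crop to 6 logits:
    dummy = torch.nn.Sequential(torch.nn.Flatten(),
                                torch.nn.Linear(3 * 224 * 224, 6))
    print(classify_lesion(dummy, torch.rand(3, 224, 224)))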

Identifying the lesion category of the lesion region in the endoscope image makes it possible to output, in real time, the specific property of an alimentary canal lesion through image recognition during the endoscopic examination of the alimentary canal, so as to assist the doctor in diagnosing alimentary canal endoscope images.

In an exemplary embodiment, a medical endoscope image recognition method further includes training a neural network by using low-quality images and non-low-quality images captured by an endoscope as samples, to obtain a neural network corresponding to a low-quality image category output probability and a non-low-quality image category output probability. The neural network is used for generating target endoscope images. Corresponding to the preceding description, the endoscope image may be an original endoscope image, or may be a standard endoscope image matching the size processing of the neural network, which is not limited herein.

As described above, for each original endoscope image obtained from the medical endoscope video stream, the trained neural network recognizes whether the image is a low-quality image, so that endoscope images corresponding to low-quality images are filtered out, avoiding useless noise that may reduce the processing efficiency.

FIG. 10 is a flowchart of a step of training a neural network by using low-quality images and non-low-quality images captured by an endoscope as samples, to obtain a neural network corresponding to a low-quality image category output probability and a non-low-quality image category output probability according to an exemplary embodiment. In an exemplary embodiment, as shown in FIG. 10, the training includes the following steps.

In step 801, the low-quality images and the non-low-quality images captured by the endoscope as the samples are adjusted to a fixed size.

In step 803, data enhancement is performed on the low-quality images and the non-low-quality images that have been adjusted to the fixed size, to obtain sample images meeting an input size of a neural network.

In step 805, a network training process for the neural network is executed by using the sample images as inputs.

First, it is to be explained that the samples for the neural network that recognizes low-quality images include, but are not limited to, the low-quality images and non-low-quality images captured by the endoscope; they further include images expanded from the captured low-quality and non-low-quality images, so as to form the sample images inputted to the neural network.

The low-quality images and non-low-quality images captured by the endoscope are not obtained from a single endoscope examination but are endoscope images widely collected through various modes.

For the network training process, the parameters and weight values of each network layer are obtained through large-scale sample inputs, and the data amount of the samples also determines the generalization performance and classification accuracy of the trained neural network. Hence, for the low-quality images and non-low-quality images captured by the endoscope, in addition to size adjustment according to the input requirements of the neural network, the data amount of the samples also needs to be continuously expanded, for example, by performing data enhancement on the images after the size adjustment is completed, to obtain more sample images.

The fixed-size adjustment is the process of adjusting an image to a fixed size, for example, 227*227 pixels. Data enhancement is data pre-processing that uses a random cutting method and the like, combined with a series of operations such as random rotation and brightness, color, contrast, and random jitter adjustments. Performing fixed-size adjustment and data enhancement on the low-quality images and the non-low-quality images yields varied images, for example, images at different angles, which enhances the generalization performance and prevents the occurrence of overfitting.
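A hedged torchvision sketch of such a pipeline; the operations mirror the ones listed above, but the specific parameter values are illustrative assumptions rather than the disclosure's exact settings:

    from torchvision import transforms

    # Training-time pipeline: fixed-size adjustment plus random cutting,
    # rotation, and brightness/color/contrast jitter to multiply the samples.
    train_transform = transforms.Compose([
        transforms.Resize((227, 227)),              # fixed-size adjustment
        transforms.RandomRotation(degrees=15),      # random rotation
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
        transforms.RandomResizedCrop(224),          # random cutting
        transforms.ToTensor(),
    ])

    # Prediction-time pipeline: a single deterministic center crop only,
    # as noted later, to keep inference consistent and fast.
    eval_transform = transforms.Compose([
        transforms.Resize((227, 227)),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])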

Data enhancement is performed on each of the low-quality images and the non-low-quality images, to change one image into multiple images, which together form the sample images meeting the input size of the neural network.

Through this exemplary embodiment, the sample data can be expanded for the network training process; on the basis of the existing low-quality images and non-low-quality images captured by the endoscope, sufficient sample data can be provided so that the network training process converges to the optimal point.

Through the exemplary embodiments stated above, real-time image recognition is performed on each original endoscope image captured by the endoscope, implementing accurate and rapid capture of lesions in the endoscope examination while maintaining real-time performance.

The method implementations above are now elaborated from the perspective of executing an alimentary canal endoscope examination.

During the process in which a doctor uses the endoscope to examine the alimentary canal, a video stream, such as the medical endoscope video stream of the alimentary canal, is inputted; while the current image is displayed synchronously, the original endoscope images are correspondingly obtained.

At this time, a series of processes of low-quality image recognition and filtering, alimentary canal part localization, lesion region localization, and category identification are performed on the original endoscope images. Therefore, real-time and accurate assistance is continuously provided during the endoscope examination, and the large quantity of original endoscope images generated while the endoscope photographs the alimentary canal is processed completely, accurately, and rapidly, so that the generation of a large quantity of medical images is no longer a bottleneck of the alimentary canal endoscope examination.

For example, FIG. 11 is a schematic diagram of an overall framework of image recognition for images photographed by the alimentary canal endoscope according to an exemplary embodiment. In an application of an exemplary embodiment, as shown in FIG. 11, in the process of photographing the alimentary canal using the endoscope, along with the movement and photographing of the endoscope in the alimentary canal, the medical endoscope video stream of the alimentary canal is outputted.

For the medical endoscope video stream of the alimentary canal, step 910 is first executed to perform low-quality image recognition and filtering on each original endoscope image, removing the original endoscope images that belong to low-quality images and generating the target endoscope images.

At this point, it is to be further indicated that for the recognition and filtering of low-quality images as a classification task, Densenet can be selected to construct the neural network to be used. In the training process of this neural network, the samples are processed through the data enhancement method; in the network prediction process, however, the data enhancement method is no longer executed, and only a single cutting method, for example, the center crop method, is used to ensure consistency, so as to avoid the increase in time consumption caused by data enhancement, ensuring the real-time performance.

Filtering out low-quality images in step 910 effectively removes the low-quality images from the original endoscope images, so that the subsequent image recognition process is executed on the non-low-quality images.

For the target endoscope images, step 920 is executed to recognize the organ part. For organ part recognition as a classification task, Densenet can also be selected to construct the neural network to be used, such as the four-class network indicated above.

By recognizing the organ part in the endoscope image, the organ part where the endoscope is currently located in the alimentary canal can be localized during the continuous movement and photographing of the endoscope, so as to provide a proper, available photographing mode for the endoscope photographing that organ part.

Different photographing modes correspond to different image types; therefore, the image type identification in step 930 is substantially the identification of the photographing mode suitable for the endoscope image. After the image type to be set for the endoscope image is obtained through recognition, the photographing mode can be switched according to this image type, thereby obtaining a suitable photographing mode for each endoscope image that remains after the low-quality images are filtered out.

For example, FIG. 12 is a schematic diagram of an endoscope image in a white light photographing mode according to an exemplary embodiment. FIG. 13 is a schematic diagram of an endoscope image in an NBI mode according to the embodiment corresponding to FIG. 12. FIG. 14 is a schematic diagram of an endoscope image in an iodine dyeing mode according to the embodiment corresponding to FIG. 12.

As can be seen from FIG. 12 to FIG. 14, the colors, textures, and details of the three images differ greatly; therefore, during recognition, adaptively switching the photographing mode upon recognition of the image type greatly enhances the accuracy of image recognition.

In step 930, the image type is identified. This step also implements a classification task; hence, the Densenet model can also be selected to construct the classification network to be used, such as the three-class network, and its network training process is similar to the training process of the low-quality image filtering network.

After the image type identification is completed and the target endoscope image in the photographing mode suitable for the alimentary canal part where the endoscope is currently located is obtained, the foreign matter localization and anti-interference processes in step 940 are executed to eliminate the interference of foreign matter, so as to complete the lesion region localization and the lesion category identification. Through the execution process stated above, an endoscope image frame is processed in an average of 150 milliseconds, which meets the real-time performance requirement while achieving very high accuracy. The execution process can be deployed in a hospital to assist a doctor in diagnosing alimentary canal endoscope images in real time, improving the diagnosis efficiency of the doctor.

Based on the execution process stated above, a more complete and available system with strong robustness for assisting the alimentary canal endoscope examination can be implemented, so as to provide assistance more comprehensively; in the processing of the endoscope images, the smoothness of the video frame rate can be ensured, for example, an average of less than 150 milliseconds per frame.

Through the execution process stated above, an alimentary canal endoscope diagnosis system directly applicable to a hospital production environment is obtained; under the current situation of scarce and unevenly distributed medical resources, the system can assist the doctor in localizing and discovering alimentary canal lesions and in preventing misdiagnosis.

Apparatus embodiments of this disclosure are described below, and can be used to perform the embodiments of the foregoing medical endoscope image recognition method of this disclosure. For details not disclosed in the apparatus embodiments of this disclosure, refer to the embodiments of the medical endoscope image recognition method of this disclosure.

FIG. 15 is a block diagram of a medical endoscope image recognition system according to an exemplary embodiment. In an exemplary embodiment, as shown in FIG. 15, the medical endoscope image recognition system includes, but is not limited to: an image obtaining module 1010, an image filtering module 1030, an organ part recognition module 1050, an image type identification module 1070, and a detail identification module 1090. One or more modules or submodules of the apparatus can be implemented by processing circuitry, software, or a combination thereof, for example.

The image obtaining module 1010 is configured to obtain original endoscope images according to a medical endoscope video stream.

The image filtering module 1030 is configured to filter the original endoscope images by using a neural network, to generate target endoscope images.

The organ part recognition module 1050 is configured to recognize organ information corresponding to the target endoscope images by using the neural network.

The image type identification module 1070 is configured to identify an image type suitable for the target endoscope images according to the corresponding organ information by using a classification network.

Further, the detail identification module 1090 is configured to localize a lesion region in each of the target endoscope images according to a part indicated by the organ information, and identify a lesion category of the lesion region in a photographing mode corresponding to the image type.

In some embodiments, this disclosure further provides a machine device. The machine device may be applied to the implementation environment in FIG. 1, to perform all or some of the steps in the method shown in any one of FIG. 3, FIG. 5, FIG. 6, FIG. 8, FIG. 9, and FIG. 10. The apparatus can include a processor and a memory. The memory is configured to store processor-executable instructions. The processor is configured to implement one or more of the foregoing methods.

Exemplary implementations of operations performed by the processor of the apparatus in this embodiment are described in detail in the foregoing embodiments. Details are not described herein.

It is to be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes may be made without departing from the scope of this disclosure.

What is claimed is:
 1. A medical endoscope image recognition method, comprising: receiving endoscope images from a medical endoscope; filtering the endoscope images with a neural network, to obtain target endoscope images; recognizing organ information corresponding to the target endoscope images via the neural network; identifying an imaging type of the target endoscope images according to the corresponding organ information with a classification network; localizing a lesion region in the target endoscope images according to an organ part indicated by the organ information; and identifying, by processing circuitry, a lesion category of the lesion region in an image capture mode of the medical endoscope corresponding to the imaging type.
 2. The method according to claim 1, wherein the endoscope images are captured by the medical endoscope in a tract connected to outside a body or a sealed body cavity.
 3. The method according to claim 1, wherein the receiving comprises: receiving the endoscope images while the medical endoscope is being manipulated and capturing the endoscope images inside a body.
 4. The method according to claim 1, wherein the receiving comprises: receiving the endoscope images of a previously stored medical endoscope video stream, at least one of the endoscope images being used to recognize the lesion region in a tract or a sealed body cavity captured by the medical endoscope and identify the lesion category of the lesion region.
 5. The method according to claim 1, wherein the filtering comprises: processing the endoscope images according to a set size to generate standard endoscope images; determining whether the standard endoscope images are low-quality images or non-low-quality images via the neural network; and filtering out the standard endoscope images that are low-quality images, to obtain the target endoscope images.
 6. The method according to claim 1, wherein the localizing comprises: detecting foreign matter in each of the target endoscope images in the image capture mode corresponding to the imaging type, to obtain a foreign matter frame in the respective target endoscope image, the foreign matter frame indicating a region having the foreign matter in the respective target endoscope image; filtering the target endoscope images according to the foreign matter frame; and localizing the lesion region with the filtered target endoscope images.
 7. The method according to claim 6, wherein before the detecting the foreign matter, the localizing further comprises: detecting an imaging type of a target endoscope image of the target endoscope images; and switching an image capture mode corresponding to the target endoscope image according to the image capture mode corresponding to the identified imaging type when the imaging type is different from the identified imaging type, to obtain the target endoscope image in the image capture mode corresponding to the imaging type.
 8. The method according to claim 6, wherein the detecting the foreign matter comprises: inputting the target endoscope images in the image capture mode corresponding to the imaging type into the neural network, performing target detection via the neural network, and outputting coordinates and confidence levels that correspond to the foreign matter frames, the coordinates indicating positions of the foreign matter frames in the target endoscope images.
 9. The method according to claim 6, wherein the filtering the target endoscope images according to the foreign matter frame comprises: determining an area proportion factor of an area occupied by the foreign matter in each of the target endoscope images according to coordinates and a confidence level corresponding to the foreign matter frame in the respective target endoscope image; determining whether the foreign matter interferes with the respective target endoscope image according to the area proportion factor; and filtering out the target endoscope images with the foreign matter interference.
 10. The method according to claim 1, wherein the localizing includes performing continuous feature extraction of the target endoscope images in the image capture mode corresponding to the imaging type by using each layer of a localization detection network, until the lesion region in the target endoscope images is obtained through regression; and the identifying the lesion category includes classifying a lesion property of the lesion region in the target endoscope images with the classification network, to obtain the lesion category of the lesion region of the target endoscope images.
 11. The method according to claim 10, wherein the classifying comprises: extending the lesion region in a respective target endoscope image of the target endoscope images, to obtain an extended region corresponding to the lesion region of the respective target endoscope image; pre-processing the extended region to normalize the extended region into a classification network input image meeting an input size; and performing network prediction on a lesion category of the input image via the classification network, to obtain the lesion category of the corresponding lesion region in the respective target endoscope image.
 12. The method according to claim 1, further comprising: training a neural network by using low-quality images and non-low-quality images captured by a reference endoscope as samples, to obtain the neural network corresponding to a low-quality image category output probability and a non-low-quality image category output probability, the neural network being configured to obtain the target endoscope images.
 13. The method according to claim 12, wherein the training the neural network comprises: adjusting the low-quality images and the non-low-quality images captured by the reference endoscope as the samples to a fixed size; performing data enhancement on the low-quality images and the non-low-quality images that have been adjusted to the fixed size, to obtain adjusted sample images meeting an input size of the neural network; and executing a network training process for the neural network by using the adjusted sample images as inputs.
 14. The method according to claim 1, wherein recognition of information about the organ part corresponding to the target endoscope images is executed by a classification network associated with the organ part, and the classification network is obtained by training with sample endoscope images in which the organ part is annotated.
 15. A medical endoscope image recognition system, comprising: processing circuitry configured to: receive endoscope images from a medical endoscope; filter the endoscope images with a neural network, to obtain target endoscope images; recognize organ information corresponding to the target endoscope images via the neural network; identify an imaging type of the target endoscope images according to the corresponding organ information with a classification network; localize a lesion region in the target endoscope images according to an organ part indicated by the organ information; and identify a lesion category of the lesion region in an image capture mode of the medical endoscope corresponding to the imaging type.
 16. The system according to claim 15, wherein the endoscope images are captured by the medical endoscope in a tract connected to outside a body or a sealed body cavity.
 17. The system according to claim 15, wherein the processing circuitry is configured to: receive the endoscope images while the medical endoscope is being manipulated and capturing the endoscope images inside a body.
 18. The system according to claim 15, wherein the processing circuitry is configured to: receive the endoscope images of a previously stored medical endoscope video stream, at least one of the endoscope images being used to recognize the lesion region in a tract or a sealed body cavity captured by the medical endoscope and identify the lesion category of the lesion region.
 19. A non-transitory computer-readable storage medium, storing instructions which when executed by a processor cause the processor to perform: receiving endoscope images from a medical endoscope; filtering the endoscope images with a neural network, to obtain target endoscope images; recognizing organ information corresponding to the target endoscope images via the neural network; identifying an imaging type of the target endoscope images according to the corresponding organ information with a classification network; localizing a lesion region in the target endoscope images according to an organ part indicated by the organ information; and identifying a lesion category of the lesion region in an image capture mode of the medical endoscope corresponding to the imaging type.
 20. An endoscopic imaging system, comprising: the medical endoscope image recognition system according to claim 15; and a display device configured to display the endoscope images.